Overview

Dataset statistics

Number of variables: 12
Number of observations: 4898
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 772
Duplicate rows (%): 15.8%
Total size in memory: 459.3 KiB
Average record size in memory: 96.0 B
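
The derived figures in the dataset statistics above can be cross-checked from the raw counts. The following is a minimal sketch, assuming each of the 12 numeric columns is stored as an 8-byte value (float64 or int64); the report does not state the dtypes, so `BYTES_PER_VALUE` is an assumption.

```python
# Cross-check of the dataset statistics reported above.
n_rows = 4898
n_cols = 12
duplicate_rows = 772

# Assumption: every column holds 8-byte numeric values (float64/int64).
BYTES_PER_VALUE = 8

duplicate_pct = 100 * duplicate_rows / n_rows      # share of duplicate rows
bytes_per_record = n_cols * BYTES_PER_VALUE        # size of one row
approx_total_kib = n_rows * bytes_per_record / 1024

print(f"Duplicate rows (%): {duplicate_pct:.1f}")
print(f"Bytes per record:   {bytes_per_record}")
print(f"Approx. total KiB:  {approx_total_kib:.1f}")
```

Under that assumption the record size matches the reported 96.0 B, and the data alone comes to roughly 459.2 KiB; the small gap to the reported 459.3 KiB would be consistent with a little index overhead, which is also an assumption.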

Variable types

Numeric: 12

Alerts

Dataset has 772 (15.8%) duplicate rows [Duplicates]
residual sugar is highly correlated with density [High correlation]
chlorides is highly correlated with density and 1 other field [High correlation]
free sulfur dioxide is highly correlated with total sulfur dioxide [High correlation]
total sulfur dioxide is highly correlated with free sulfur dioxide and 1 other field [High correlation]
density is highly correlated with residual sugar and 3 other fields [High correlation]
alcohol is highly correlated with chlorides and 1 other field [High correlation]
residual sugar is highly correlated with density [High correlation]
free sulfur dioxide is highly correlated with total sulfur dioxide [High correlation]
total sulfur dioxide is highly correlated with free sulfur dioxide and 1 other field [High correlation]
density is highly correlated with residual sugar and 2 other fields [High correlation]
alcohol is highly correlated with density [High correlation]
residual sugar is highly correlated with density [High correlation]
density is highly correlated with residual sugar and 1 other field [High correlation]
alcohol is highly correlated with density [High correlation]
residual sugar is highly correlated with density [High correlation]
free sulfur dioxide is highly correlated with total sulfur dioxide [High correlation]
total sulfur dioxide is highly correlated with free sulfur dioxide [High correlation]
density is highly correlated with residual sugar and 1 other field [High correlation]
alcohol is highly correlated with density [High correlation]

Reproduction

Analysis started: 2021-12-30 06:54:42.184788
Analysis finished: 2021-12-30 06:55:12.190023
Duration: 30.01 seconds
Software version: pandas-profiling v3.1.0
Download configuration: config.json

Variables

fixed acidity
Real number (ℝ≥0)

Distinct: 68
Distinct (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.854787668
Minimum: 3.8
Maximum: 14.2
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 38.4 KiB

Quantile statistics

Minimum: 3.8
5-th percentile: 5.6
Q1: 6.3
median: 6.8
Q3: 7.3
95-th percentile: 8.3
Maximum: 14.2
Range: 10.4
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.8438682277
Coefficient of variation (CV): 0.1231063993
Kurtosis: 2.172178465
Mean: 6.854787668
Median Absolute Deviation (MAD): 0.5
Skewness: 0.6477514746
Sum: 33574.75
Variance: 0.7121135857
Monotonicity: Not monotonic
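
Several of the figures above are derived from others in the same block. As a hedged illustration (a consistency check on the reported fixed acidity numbers, not the profiler's own code), the sketch below recomputes the range, IQR, coefficient of variation and variance from the reported minimum, maximum, quartiles, mean and standard deviation.

```python
import math

# Values copied from the fixed acidity statistics above.
mean = 6.854787668
std = 0.8438682277
minimum, maximum = 3.8, 14.2
q1, q3 = 6.3, 7.3

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range
cv = std / mean                   # Coefficient of variation
variance = std ** 2               # Variance is the squared standard deviation

print(f"Range:    {value_range:.1f}")
print(f"IQR:      {iqr:.1f}")
print(f"CV:       {cv:.10f}")
print(f"Variance: {variance:.10f}")
```

The same relationships hold for every variable in the report, so they are a quick way to check a flattened statistics block for transcription errors.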
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
6.8 | 308 | 6.3%
6.6 | 290 | 5.9%
6.4 | 280 | 5.7%
6.9 | 241 | 4.9%
6.7 | 236 | 4.8%
7 | 232 | 4.7%
6.5 | 225 | 4.6%
7.2 | 206 | 4.2%
7.1 | 200 | 4.1%
7.4 | 194 | 4.0%
Other values (58) | 2486 | 50.8%

Value | Count | Frequency (%)
3.8 | 1 | < 0.1%
3.9 | 1 | < 0.1%
4.2 | 2 | < 0.1%
4.4 | 3 | 0.1%
4.5 | 1 | < 0.1%
4.6 | 1 | < 0.1%
4.7 | 5 | 0.1%
4.8 | 9 | 0.2%
4.9 | 7 | 0.1%
5 | 24 | 0.5%

Value | Count | Frequency (%)
14.2 | 1 | < 0.1%
11.8 | 1 | < 0.1%
10.7 | 2 | < 0.1%
10.3 | 2 | < 0.1%
10.2 | 1 | < 0.1%
10 | 3 | 0.1%
9.9 | 2 | < 0.1%
9.8 | 8 | 0.2%
9.7 | 4 | 0.1%
9.6 | 5 | 0.1%

volatile acidity
Real number (ℝ≥0)

Distinct: 125
Distinct (%): 2.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.2782411188
Minimum: 0.08
Maximum: 1.1
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 38.4 KiB

Quantile statistics

Minimum: 0.08
5-th percentile: 0.15
Q1: 0.21
median: 0.26
Q3: 0.32
95-th percentile: 0.46
Maximum: 1.1
Range: 1.02
Interquartile range (IQR): 0.11

Descriptive statistics

Standard deviation: 0.1007945484
Coefficient of variation (CV): 0.3622561211
Kurtosis: 5.091625817
Mean: 0.2782411188
Median Absolute Deviation (MAD): 0.06
Skewness: 1.576979503
Sum: 1362.825
Variance: 0.01015954099
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0.28 | 263 | 5.4%
0.24 | 253 | 5.2%
0.26 | 240 | 4.9%
0.25 | 231 | 4.7%
0.22 | 229 | 4.7%
0.27 | 218 | 4.5%
0.23 | 216 | 4.4%
0.2 | 214 | 4.4%
0.3 | 198 | 4.0%
0.21 | 191 | 3.9%
Other values (115) | 2645 | 54.0%

Value | Count | Frequency (%)
0.08 | 4 | 0.1%
0.085 | 1 | < 0.1%
0.09 | 1 | < 0.1%
0.1 | 6 | 0.1%
0.105 | 6 | 0.1%
0.11 | 13 | 0.3%
0.115 | 3 | 0.1%
0.12 | 34 | 0.7%
0.125 | 3 | 0.1%
0.13 | 44 | 0.9%

Value | Count | Frequency (%)
1.1 | 1 | < 0.1%
1.005 | 1 | < 0.1%
0.965 | 1 | < 0.1%
0.93 | 1 | < 0.1%
0.91 | 1 | < 0.1%
0.905 | 1 | < 0.1%
0.85 | 1 | < 0.1%
0.815 | 1 | < 0.1%
0.785 | 1 | < 0.1%
0.78 | 1 | < 0.1%

citric acid
Real number (ℝ≥0)

Distinct: 87
Distinct (%): 1.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.3341915067
Minimum: 0
Maximum: 1.66
Zeros: 19
Zeros (%): 0.4%
Negative: 0
Negative (%): 0.0%
Memory size: 38.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.17
Q1: 0.27
median: 0.32
Q3: 0.39
95-th percentile: 0.54
Maximum: 1.66
Range: 1.66
Interquartile range (IQR): 0.12

Descriptive statistics

Standard deviation: 0.1210198042
Coefficient of variation (CV): 0.362127109
Kurtosis: 6.174900657
Mean: 0.3341915067
Median Absolute Deviation (MAD): 0.06
Skewness: 1.281920398
Sum: 1636.87
Variance: 0.01464579301
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0.3 | 307 | 6.3%
0.28 | 282 | 5.8%
0.32 | 257 | 5.2%
0.34 | 225 | 4.6%
0.29 | 223 | 4.6%
0.26 | 219 | 4.5%
0.27 | 216 | 4.4%
0.49 | 215 | 4.4%
0.31 | 200 | 4.1%
0.33 | 183 | 3.7%
Other values (77) | 2571 | 52.5%

Value | Count | Frequency (%)
0 | 19 | 0.4%
0.01 | 7 | 0.1%
0.02 | 6 | 0.1%
0.03 | 2 | < 0.1%
0.04 | 12 | 0.2%
0.05 | 5 | 0.1%
0.06 | 6 | 0.1%
0.07 | 12 | 0.2%
0.08 | 4 | 0.1%
0.09 | 12 | 0.2%

Value | Count | Frequency (%)
1.66 | 1 | < 0.1%
1.23 | 1 | < 0.1%
1 | 5 | 0.1%
0.99 | 1 | < 0.1%
0.91 | 2 | < 0.1%
0.88 | 1 | < 0.1%
0.86 | 1 | < 0.1%
0.82 | 2 | < 0.1%
0.81 | 2 | < 0.1%
0.8 | 2 | < 0.1%

residual sugar
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 310
Distinct (%): 6.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.391414863
Minimum: 0.6
Maximum: 65.8
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 38.4 KiB

Quantile statistics

Minimum: 0.6
5-th percentile: 1.1
Q1: 1.7
median: 5.2
Q3: 9.9
95-th percentile: 15.7
Maximum: 65.8
Range: 65.2
Interquartile range (IQR): 8.2

Descriptive statistics

Standard deviation: 5.072057784
Coefficient of variation (CV): 0.7935735502
Kurtosis: 3.469820103
Mean: 6.391414863
Median Absolute Deviation (MAD): 3.6
Skewness: 1.077093756
Sum: 31305.15
Variance: 25.72577016
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
1.2 | 187 | 3.8%
1.4 | 184 | 3.8%
1.6 | 165 | 3.4%
1.3 | 147 | 3.0%
1.1 | 146 | 3.0%
1.5 | 142 | 2.9%
1.8 | 99 | 2.0%
1.7 | 99 | 2.0%
1 | 93 | 1.9%
2 | 79 | 1.6%
Other values (300) | 3557 | 72.6%

Value | Count | Frequency (%)
0.6 | 2 | < 0.1%
0.7 | 7 | 0.1%
0.8 | 25 | 0.5%
0.9 | 39 | 0.8%
0.95 | 4 | 0.1%
1 | 93 | 1.9%
1.05 | 1 | < 0.1%
1.1 | 146 | 3.0%
1.15 | 3 | 0.1%
1.2 | 187 | 3.8%

Value | Count | Frequency (%)
65.8 | 1 | < 0.1%
31.6 | 2 | < 0.1%
26.05 | 2 | < 0.1%
23.5 | 1 | < 0.1%
22.6 | 1 | < 0.1%
22 | 2 | < 0.1%
20.8 | 2 | < 0.1%
20.7 | 2 | < 0.1%
20.4 | 1 | < 0.1%
20.3 | 1 | < 0.1%

chlorides
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 160
Distinct (%): 3.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.04577235606
Minimum: 0.009
Maximum: 0.346
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 38.4 KiB

Quantile statistics

Minimum: 0.009
5-th percentile: 0.027
Q1: 0.036
median: 0.043
Q3: 0.05
95-th percentile: 0.067
Maximum: 0.346
Range: 0.337
Interquartile range (IQR): 0.014

Descriptive statistics

Standard deviation: 0.02184796809
Coefficient of variation (CV): 0.4773179703
Kurtosis: 37.56459971
Mean: 0.04577235606
Median Absolute Deviation (MAD): 0.007
Skewness: 5.023330683
Sum: 224.193
Variance: 0.0004773337098
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0.044 | 201 | 4.1%
0.036 | 200 | 4.1%
0.042 | 184 | 3.8%
0.04 | 182 | 3.7%
0.046 | 181 | 3.7%
0.048 | 174 | 3.6%
0.047 | 171 | 3.5%
0.05 | 170 | 3.5%
0.045 | 170 | 3.5%
0.034 | 168 | 3.4%
Other values (150) | 3097 | 63.2%

Value | Count | Frequency (%)
0.009 | 1 | < 0.1%
0.012 | 1 | < 0.1%
0.013 | 1 | < 0.1%
0.014 | 4 | 0.1%
0.015 | 4 | 0.1%
0.016 | 5 | 0.1%
0.017 | 5 | 0.1%
0.018 | 10 | 0.2%
0.019 | 9 | 0.2%
0.02 | 16 | 0.3%

Value | Count | Frequency (%)
0.346 | 1 | < 0.1%
0.301 | 1 | < 0.1%
0.29 | 1 | < 0.1%
0.271 | 1 | < 0.1%
0.255 | 1 | < 0.1%
0.244 | 1 | < 0.1%
0.24 | 1 | < 0.1%
0.239 | 1 | < 0.1%
0.217 | 1 | < 0.1%
0.212 | 1 | < 0.1%

free sulfur dioxide
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 132
Distinct (%): 2.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 35.30808493
Minimum: 2
Maximum: 289
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 38.4 KiB

Quantile statistics

Minimum: 2
5-th percentile: 11
Q1: 23
median: 34
Q3: 46
95-th percentile: 63
Maximum: 289
Range: 287
Interquartile range (IQR): 23

Descriptive statistics

Standard deviation: 17.00713733
Coefficient of variation (CV): 0.4816782716
Kurtosis: 11.46634243
Mean: 35.30808493
Median Absolute Deviation (MAD): 11
Skewness: 1.406744921
Sum: 172939
Variance: 289.24272
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
29 | 160 | 3.3%
31 | 132 | 2.7%
26 | 129 | 2.6%
35 | 129 | 2.6%
34 | 128 | 2.6%
36 | 127 | 2.6%
24 | 118 | 2.4%
28 | 112 | 2.3%
33 | 112 | 2.3%
37 | 111 | 2.3%
Other values (122) | 3640 | 74.3%

Value | Count | Frequency (%)
2 | 1 | < 0.1%
3 | 10 | 0.2%
4 | 11 | 0.2%
5 | 25 | 0.5%
6 | 32 | 0.7%
7 | 25 | 0.5%
8 | 35 | 0.7%
9 | 29 | 0.6%
10 | 55 | 1.1%
11 | 45 | 0.9%

Value | Count | Frequency (%)
289 | 1 | < 0.1%
146.5 | 1 | < 0.1%
138.5 | 1 | < 0.1%
131 | 1 | < 0.1%
128 | 1 | < 0.1%
124 | 1 | < 0.1%
122.5 | 1 | < 0.1%
118.5 | 1 | < 0.1%
112 | 1 | < 0.1%
110 | 1 | < 0.1%

total sulfur dioxide
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 251
Distinct (%): 5.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 138.3606574
Minimum: 9
Maximum: 440
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 38.4 KiB

Quantile statistics

Minimum: 9
5-th percentile: 75
Q1: 108
median: 134
Q3: 167
95-th percentile: 212
Maximum: 440
Range: 431
Interquartile range (IQR): 59

Descriptive statistics

Standard deviation: 42.49806455
Coefficient of variation (CV): 0.3071542543
Kurtosis: 0.5718532334
Mean: 138.3606574
Median Absolute Deviation (MAD): 29
Skewness: 0.3907098417
Sum: 677690.5
Variance: 1806.085491
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
111 | 69 | 1.4%
113 | 61 | 1.2%
117 | 57 | 1.2%
118 | 55 | 1.1%
128 | 54 | 1.1%
122 | 54 | 1.1%
114 | 54 | 1.1%
150 | 54 | 1.1%
124 | 53 | 1.1%
140 | 52 | 1.1%
Other values (241) | 4335 | 88.5%

Value | Count | Frequency (%)
9 | 1 | < 0.1%
10 | 1 | < 0.1%
18 | 2 | < 0.1%
19 | 1 | < 0.1%
21 | 1 | < 0.1%
24 | 3 | 0.1%
25 | 1 | < 0.1%
26 | 1 | < 0.1%
28 | 4 | 0.1%
29 | 2 | < 0.1%

Value | Count | Frequency (%)
440 | 1 | < 0.1%
366.5 | 1 | < 0.1%
344 | 1 | < 0.1%
313 | 1 | < 0.1%
307.5 | 1 | < 0.1%
303 | 1 | < 0.1%
294 | 1 | < 0.1%
282 | 1 | < 0.1%
272 | 2 | < 0.1%
260 | 1 | < 0.1%

density
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 890
Distinct (%): 18.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.9940273765
Minimum: 0.98711
Maximum: 1.03898
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 38.4 KiB

Quantile statistics

Minimum: 0.98711
5-th percentile: 0.9896385
Q1: 0.9917225
median: 0.99374
Q3: 0.9961
95-th percentile: 0.999
Maximum: 1.03898
Range: 0.05187
Interquartile range (IQR): 0.0043775

Descriptive statistics

Standard deviation: 0.002990906917
Coefficient of variation (CV): 0.003008877811
Kurtosis: 9.793806911
Mean: 0.9940273765
Median Absolute Deviation (MAD): 0.00214
Skewness: 0.9777730049
Sum: 4868.74609
Variance: 8.945524186 × 10⁻⁶
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0.992 | 64 | 1.3%
0.9928 | 61 | 1.2%
0.9932 | 53 | 1.1%
0.993 | 52 | 1.1%
0.9934 | 50 | 1.0%
0.9938 | 49 | 1.0%
0.9927 | 47 | 1.0%
0.9944 | 46 | 0.9%
0.9948 | 45 | 0.9%
0.9954 | 44 | 0.9%
Other values (880) | 4387 | 89.6%

Value | Count | Frequency (%)
0.98711 | 1 | < 0.1%
0.98713 | 1 | < 0.1%
0.98722 | 1 | < 0.1%
0.9874 | 1 | < 0.1%
0.98742 | 2 | < 0.1%
0.98746 | 2 | < 0.1%
0.98758 | 1 | < 0.1%
0.98774 | 1 | < 0.1%
0.98779 | 1 | < 0.1%
0.98794 | 2 | < 0.1%

Value | Count | Frequency (%)
1.03898 | 1 | < 0.1%
1.0103 | 2 | < 0.1%
1.00295 | 2 | < 0.1%
1.00241 | 1 | < 0.1%
1.0024 | 1 | < 0.1%
1.00196 | 1 | < 0.1%
1.00182 | 1 | < 0.1%
1.0017 | 2 | < 0.1%
1.0012 | 1 | < 0.1%
1.00118 | 1 | < 0.1%

pH
Real number (ℝ≥0)

Distinct: 103
Distinct (%): 2.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.188266639
Minimum: 2.72
Maximum: 3.82
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 38.4 KiB

Quantile statistics

Minimum: 2.72
5-th percentile: 2.96
Q1: 3.09
median: 3.18
Q3: 3.28
95-th percentile: 3.46
Maximum: 3.82
Range: 1.1
Interquartile range (IQR): 0.19

Descriptive statistics

Standard deviation: 0.1510005996
Coefficient of variation (CV): 0.04736134605
Kurtosis: 0.5307749515
Mean: 3.188266639
Median Absolute Deviation (MAD): 0.1
Skewness: 0.4577825459
Sum: 15616.13
Variance: 0.02280118108
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
3.14 | 172 | 3.5%
3.16 | 164 | 3.3%
3.22 | 146 | 3.0%
3.19 | 145 | 3.0%
3.18 | 138 | 2.8%
3.2 | 137 | 2.8%
3.08 | 136 | 2.8%
3.15 | 136 | 2.8%
3.1 | 135 | 2.8%
3.12 | 134 | 2.7%
Other values (93) | 3455 | 70.5%

Value | Count | Frequency (%)
2.72 | 1 | < 0.1%
2.74 | 1 | < 0.1%
2.77 | 1 | < 0.1%
2.79 | 3 | 0.1%
2.8 | 3 | 0.1%
2.82 | 1 | < 0.1%
2.83 | 4 | 0.1%
2.84 | 1 | < 0.1%
2.85 | 9 | 0.2%
2.86 | 9 | 0.2%

Value | Count | Frequency (%)
3.82 | 1 | < 0.1%
3.81 | 1 | < 0.1%
3.8 | 2 | < 0.1%
3.79 | 1 | < 0.1%
3.77 | 2 | < 0.1%
3.76 | 2 | < 0.1%
3.75 | 2 | < 0.1%
3.74 | 2 | < 0.1%
3.72 | 3 | 0.1%
3.7 | 1 | < 0.1%

sulphates
Real number (ℝ≥0)

Distinct: 79
Distinct (%): 1.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.4898468763
Minimum: 0.22
Maximum: 1.08
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 38.4 KiB

Quantile statistics

Minimum: 0.22
5-th percentile: 0.34
Q1: 0.41
median: 0.47
Q3: 0.55
95-th percentile: 0.71
Maximum: 1.08
Range: 0.86
Interquartile range (IQR): 0.14

Descriptive statistics

Standard deviation: 0.1141258339
Coefficient of variation (CV): 0.2329826717
Kurtosis: 1.59092963
Mean: 0.4898468763
Median Absolute Deviation (MAD): 0.07
Skewness: 0.9771936833
Sum: 2399.27
Variance: 0.01302470597
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0.5 | 249 | 5.1%
0.46 | 225 | 4.6%
0.44 | 216 | 4.4%
0.38 | 214 | 4.4%
0.42 | 181 | 3.7%
0.48 | 179 | 3.7%
0.45 | 178 | 3.6%
0.47 | 172 | 3.5%
0.4 | 168 | 3.4%
0.54 | 167 | 3.4%
Other values (69) | 2949 | 60.2%

Value | Count | Frequency (%)
0.22 | 1 | < 0.1%
0.23 | 1 | < 0.1%
0.25 | 4 | 0.1%
0.26 | 4 | 0.1%
0.27 | 13 | 0.3%
0.28 | 13 | 0.3%
0.29 | 16 | 0.3%
0.3 | 31 | 0.6%
0.31 | 35 | 0.7%
0.32 | 54 | 1.1%

Value | Count | Frequency (%)
1.08 | 1 | < 0.1%
1.06 | 1 | < 0.1%
1.01 | 1 | < 0.1%
1 | 1 | < 0.1%
0.99 | 1 | < 0.1%
0.98 | 6 | 0.1%
0.97 | 1 | < 0.1%
0.96 | 3 | 0.1%
0.95 | 5 | 0.1%
0.94 | 2 | < 0.1%

alcohol
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 103
Distinct (%): 2.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10.51426705
Minimum: 8
Maximum: 14.2
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 38.4 KiB

Quantile statistics

Minimum: 8
5-th percentile: 8.9
Q1: 9.5
median: 10.4
Q3: 11.4
95-th percentile: 12.7
Maximum: 14.2
Range: 6.2
Interquartile range (IQR): 1.9

Descriptive statistics

Standard deviation: 1.230620568
Coefficient of variation (CV): 0.1170429248
Kurtosis: -0.6984253278
Mean: 10.51426705
Median Absolute Deviation (MAD): 1
Skewness: 0.4873419932
Sum: 51498.88
Variance: 1.514426982
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
9.4 | 229 | 4.7%
9.5 | 228 | 4.7%
9.2 | 199 | 4.1%
9 | 185 | 3.8%
10 | 162 | 3.3%
10.5 | 160 | 3.3%
11 | 158 | 3.2%
10.4 | 153 | 3.1%
9.1 | 144 | 2.9%
9.8 | 136 | 2.8%
Other values (93) | 3144 | 64.2%

Value | Count | Frequency (%)
8 | 2 | < 0.1%
8.4 | 3 | 0.1%
8.5 | 9 | 0.2%
8.6 | 23 | 0.5%
8.7 | 78 | 1.6%
8.8 | 107 | 2.2%
8.9 | 95 | 1.9%
9 | 185 | 3.8%
9.1 | 144 | 2.9%
9.2 | 199 | 4.1%

Value | Count | Frequency (%)
14.2 | 1 | < 0.1%
14.05 | 1 | < 0.1%
14 | 5 | 0.1%
13.9 | 3 | 0.1%
13.8 | 2 | < 0.1%
13.7 | 7 | 0.1%
13.6 | 9 | 0.2%
13.55 | 1 | < 0.1%
13.5 | 12 | 0.2%
13.4 | 20 | 0.4%

quality
Real number (ℝ≥0)

Distinct: 7
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.877909351
Minimum: 3
Maximum: 9
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 38.4 KiB

Quantile statistics

Minimum: 3
5-th percentile: 5
Q1: 5
median: 6
Q3: 6
95-th percentile: 7
Maximum: 9
Range: 6
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.885638575
Coefficient of variation (CV): 0.1506723772
Kurtosis: 0.2165258272
Mean: 5.877909351
Median Absolute Deviation (MAD): 1
Skewness: 0.1557963977
Sum: 28790
Variance: 0.7843556855
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)
Value | Count | Frequency (%)
6 | 2198 | 44.9%
5 | 1457 | 29.7%
7 | 880 | 18.0%
8 | 175 | 3.6%
4 | 163 | 3.3%
3 | 20 | 0.4%
9 | 5 | 0.1%

Value | Count | Frequency (%)
3 | 20 | 0.4%
4 | 163 | 3.3%
5 | 1457 | 29.7%
6 | 2198 | 44.9%
7 | 880 | 18.0%
8 | 175 | 3.6%
9 | 5 | 0.1%

Value | Count | Frequency (%)
9 | 5 | 0.1%
8 | 175 | 3.6%
7 | 880 | 18.0%
6 | 2198 | 44.9%
5 | 1457 | 29.7%
4 | 163 | 3.3%
3 | 20 | 0.4%
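
Because quality takes only seven distinct values, its summary statistics can be recovered from the frequency table alone. The sketch below is a consistency check on the reported numbers rather than the profiler's own computation; it assumes the reported variance is the sample variance (divisor n - 1), which is what the figures are consistent with.

```python
# Frequency table for quality: value -> count (copied from the report above).
counts = {3: 20, 4: 163, 5: 1457, 6: 2198, 7: 880, 8: 175, 9: 5}

n = sum(counts.values())                                # number of observations
total = sum(value * c for value, c in counts.items())   # reported "Sum"
mean = total / n

# Assumption: sample variance with divisor (n - 1).
sum_sq_dev = sum(c * (value - mean) ** 2 for value, c in counts.items())
variance = sum_sq_dev / (n - 1)
std = variance ** 0.5

# Percentage column, rounded to one decimal as in the table.
percentages = {value: round(100 * c / n, 1) for value, c in counts.items()}

print(n, total, mean, variance, std)
print(percentages)
```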

Interactions

Correlations

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables, so it is better than Pearson's r at catching nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
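
As a minimal sketch of that definition in pure Python (the helper names `average_ranks`, `pearson` and `spearman` are illustrative, not part of pandas-profiling): ranks are assigned with ties sharing their average rank, and the Pearson formula is then applied to the ranks.

```python
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Covariance divided by the product of the standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but nonlinear in x
print(spearman(x, y), pearson(x, y))
```

In this toy example the relationship is perfectly monotonic, so ρ reaches 1 while Pearson's r stays below 1.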

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, which implies that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
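
The formula and the invariance property can be illustrated with a short pure-Python sketch. The function name `pearson_r` and the toy data are made up for this example.

```python
def pearson_r(x, y):
    """Pearson's r: covariance of x and y over the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = (sum((a - mean_x) ** 2 for a in x) / n) ** 0.5
    std_y = (sum((b - mean_y) ** 2 for b in y) / n) ** 0.5
    return cov / (std_x * std_y)

# Toy data, roughly linear.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r = pearson_r(x, y)

# Shift and rescale each variable separately (positive scale factors).
x_shifted = [10 * v + 3 for v in x]
y_shifted = [0.5 * v - 7 for v in y]
r_shifted = pearson_r(x_shifted, y_shifted)

print(r, r_shifted)  # the two values agree up to floating-point error
```

The 1/n factors cancel between numerator and denominator, so using population or sample formulas gives the same r.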

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
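
A minimal sketch of that pair-counting definition follows. It implements the simple variant often called tau-a, in which tied pairs count as neither concordant nor discordant; statistical libraries commonly apply a tie correction (tau-b), so results can differ when the data contain ties. The function name `kendall_tau_a` is illustrative.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs; ties count as neither."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

x = [1, 2, 3, 4]
y = [1, 3, 2, 4]  # one swapped pair relative to x
tau = kendall_tau_a(x, y)
print(tau)  # 5 concordant, 1 discordant, 6 pairs
```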

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for Phik.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.

Sample

First rows

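index | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality
0 | 7.0 | 0.27 | 0.36 | 20.7 | 0.045 | 45.0 | 170.0 | 1.0010 | 3.00 | 0.45 | 8.8 | 6
1 | 6.3 | 0.30 | 0.34 | 1.6 | 0.049 | 14.0 | 132.0 | 0.9940 | 3.30 | 0.49 | 9.5 | 6
2 | 8.1 | 0.28 | 0.40 | 6.9 | 0.050 | 30.0 | 97.0 | 0.9951 | 3.26 | 0.44 | 10.1 | 6
3 | 7.2 | 0.23 | 0.32 | 8.5 | 0.058 | 47.0 | 186.0 | 0.9956 | 3.19 | 0.40 | 9.9 | 6
4 | 7.2 | 0.23 | 0.32 | 8.5 | 0.058 | 47.0 | 186.0 | 0.9956 | 3.19 | 0.40 | 9.9 | 6
5 | 8.1 | 0.28 | 0.40 | 6.9 | 0.050 | 30.0 | 97.0 | 0.9951 | 3.26 | 0.44 | 10.1 | 6
6 | 6.2 | 0.32 | 0.16 | 7.0 | 0.045 | 30.0 | 136.0 | 0.9949 | 3.18 | 0.47 | 9.6 | 6
7 | 7.0 | 0.27 | 0.36 | 20.7 | 0.045 | 45.0 | 170.0 | 1.0010 | 3.00 | 0.45 | 8.8 | 6
8 | 6.3 | 0.30 | 0.34 | 1.6 | 0.049 | 14.0 | 132.0 | 0.9940 | 3.30 | 0.49 | 9.5 | 6
9 | 8.1 | 0.22 | 0.43 | 1.5 | 0.044 | 28.0 | 129.0 | 0.9938 | 3.22 | 0.45 | 11.0 | 6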

Last rows

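index | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality
4888 | 6.8 | 0.220 | 0.36 | 1.20 | 0.052 | 38.0 | 127.0 | 0.99330 | 3.04 | 0.54 | 9.2 | 5
4889 | 4.9 | 0.235 | 0.27 | 11.75 | 0.030 | 34.0 | 118.0 | 0.99540 | 3.07 | 0.50 | 9.4 | 6
4890 | 6.1 | 0.340 | 0.29 | 2.20 | 0.036 | 25.0 | 100.0 | 0.98938 | 3.06 | 0.44 | 11.8 | 6
4891 | 5.7 | 0.210 | 0.32 | 0.90 | 0.038 | 38.0 | 121.0 | 0.99074 | 3.24 | 0.46 | 10.6 | 6
4892 | 6.5 | 0.230 | 0.38 | 1.30 | 0.032 | 29.0 | 112.0 | 0.99298 | 3.29 | 0.54 | 9.7 | 5
4893 | 6.2 | 0.210 | 0.29 | 1.60 | 0.039 | 24.0 | 92.0 | 0.99114 | 3.27 | 0.50 | 11.2 | 6
4894 | 6.6 | 0.320 | 0.36 | 8.00 | 0.047 | 57.0 | 168.0 | 0.99490 | 3.15 | 0.46 | 9.6 | 5
4895 | 6.5 | 0.240 | 0.19 | 1.20 | 0.041 | 30.0 | 111.0 | 0.99254 | 2.99 | 0.46 | 9.4 | 6
4896 | 5.5 | 0.290 | 0.30 | 1.10 | 0.022 | 20.0 | 110.0 | 0.98869 | 3.34 | 0.38 | 12.8 | 7
4897 | 6.0 | 0.210 | 0.38 | 0.80 | 0.020 | 22.0 | 98.0 | 0.98941 | 3.26 | 0.32 | 11.8 | 6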

Duplicate rows

Most frequently occurring

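index | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality | # duplicates
423 | 7.0 | 0.15 | 0.28 | 14.7 | 0.051 | 29.0 | 149.0 | 0.99792 | 2.96 | 0.39 | 9.0 | 7 | 8
557 | 7.3 | 0.19 | 0.27 | 13.9 | 0.057 | 45.0 | 155.0 | 0.99807 | 2.94 | 0.41 | 8.8 | 8 | 8
335 | 6.8 | 0.18 | 0.30 | 12.8 | 0.062 | 19.0 | 171.0 | 0.99808 | 3.00 | 0.52 | 9.0 | 7 | 7
589 | 7.4 | 0.16 | 0.30 | 13.7 | 0.056 | 33.0 | 168.0 | 0.99825 | 2.90 | 0.44 | 8.7 | 7 | 7
588 | 7.4 | 0.16 | 0.27 | 15.5 | 0.050 | 25.0 | 135.0 | 0.99840 | 2.90 | 0.43 | 8.7 | 7 | 6
592 | 7.4 | 0.19 | 0.30 | 12.8 | 0.053 | 48.5 | 229.0 | 0.99860 | 3.14 | 0.49 | 9.1 | 7 | 6
593 | 7.4 | 0.19 | 0.31 | 14.5 | 0.045 | 39.0 | 193.0 | 0.99860 | 3.10 | 0.50 | 9.2 | 6 | 6
641 | 7.6 | 0.20 | 0.30 | 14.2 | 0.056 | 53.0 | 212.5 | 0.99900 | 3.14 | 0.46 | 8.9 | 8 | 6
28 | 5.7 | 0.22 | 0.20 | 16.0 | 0.044 | 41.0 | 113.0 | 0.99862 | 3.22 | 0.46 | 8.9 | 6 | 5
110 | 6.2 | 0.23 | 0.36 | 17.2 | 0.039 | 37.0 | 130.0 | 0.99946 | 3.23 | 0.43 | 8.8 | 6 | 5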